Color quantization is an important operation with numerous applications
in graphics and image processing. Most quantization methods are essentially
based on data clustering algorithms. However, despite its popularity as a
general-purpose clustering algorithm, k-means has not received much respect in
the color quantization literature because of its high computational
requirements and sensitivity to initialization. In this paper, a fast color quantization
method based on k-means is presented. The method involves several
modifications to the conventional (batch) k-means algorithm including data reduction,
sample weighting, and the use of the triangle inequality to speed up the nearest
neighbor search. Experiments on a diverse set of images demonstrate that, with
the proposed modifications, k-means becomes very competitive with
state-of-the-art color quantization methods in terms of both effectiveness and efficiency.
©2010 Optical Society of America 

OCIS codes: 100.2000,100.5010 
1. Introduction 

True-color images typically contain thousands of colors, which makes their display, 
storage, transmission, and processing problematic. For this reason, color quantization 
(reduction) is commonly used as a preprocessing step for various graphics and image 
processing tasks. In the past, color quantization was a necessity due to the
limitations of the display hardware, which could not handle the 16 million possible colors
in 24-bit images. Although 24-bit display hardware has become more common, color
quantization still maintains its practical value [1]. Modern applications of color
quantization include: (i) image compression [2], (ii) image segmentation [3], (iii) image
analysis [4], (iv) image watermarking [5], and (v) content-based image retrieval [6].

The process of color quantization is mainly comprised of two phases: palette design 
(the selection of a small set of colors that represents the original image colors) and 
pixel mapping (the assignment of each input pixel to one of the palette colors). The 
primary objective is to reduce the number of unique colors, N', in an image to K
(K ≪ N') with minimal distortion. In most applications, 24-bit pixels in the original
image are reduced to 8 bits or fewer. Since natural images often contain a large 
number of colors, faithful representation of these images with a limited size palette is 
a difficult problem. 

Color quantization methods can be broadly classified into two categories [7]:
image-independent methods that determine a universal (fixed) palette without regard to any
specific image [8], and image-dependent methods that determine a custom (adaptive)
palette based on the color distribution of the images. Despite being very fast, image- 
independent methods usually give poor results since they do not take into account the 
image contents. Therefore, most of the studies in the literature consider only image- 
dependent methods, which strive to achieve a better balance between computational 
efficiency and visual quality of the quantization output.

Numerous image-dependent color quantization methods have been developed in the
past three decades. These can be categorized into two families: preclustering methods
and postclustering methods [7]. Preclustering methods are mostly based on the
statistical analysis of the color distribution of the images. Divisive preclustering methods
start with a single cluster that contains all N image pixels. This initial cluster is
recursively subdivided until K clusters are obtained. Well-known divisive methods
include median-cut [9], octree [10], variance-based method [11], binary splitting [12],
greedy orthogonal bipartitioning [13], center-cut [14], and rwm-cut [15]. More recent
methods can be found in [16–18]. On the other hand, agglomerative preclustering
methods [19–23] start with N singleton clusters each of which contains one image
pixel. These clusters are repeatedly merged until K clusters remain. In contrast to
preclustering methods that compute the palette only once, postclustering methods
first determine an initial palette and then improve it iteratively. Essentially, any data
clustering method can be used for this purpose. Since these methods involve iterative
or stochastic optimization, they can obtain higher quality results when compared
to preclustering methods at the expense of increased computational time.
Clustering algorithms adapted to color quantization include k-means [24–27], minmax [28],
competitive learning [29–31], fuzzy c-means [32,33], BIRCH [34], and self-organizing
maps [35–37].

In this paper, a fast color quantization method based on the k-means clustering
algorithm [38] is presented. The method first reduces the amount of data to be
clustered by sampling only the pixels with unique colors. In order to incorporate the color
distribution of the pixels into the clustering procedure, each color sample is assigned a
weight proportional to its frequency. These weighted samples are then clustered using
a fast and exact variant of the k-means algorithm. The set of final cluster centers is
taken as the quantization palette.

The rest of the paper is organized as follows. Section 2 describes the conventional
k-means clustering algorithm and the proposed modifications. Section 3 describes the
experimental setup and presents the comparison of the proposed method with other
color quantization methods. Finally, Section 4 gives the conclusions.

2. Color Quantization Using K-Means Clustering Algorithm 

The k-means (KM) algorithm is inarguably one of the most widely used methods
for data clustering [39]. Given a data set $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\} \in \mathbb{R}^D$, the objective
of KM is to partition X into K exhaustive and mutually exclusive clusters
$S = \{S_1, \ldots, S_K\}$, $\bigcup_{k=1}^{K} S_k = X$, $S_i \cap S_j = \emptyset$ for $i \neq j$, by minimizing the sum of squared
error (SSE):

$$\mathrm{SSE} = \sum_{k=1}^{K} \sum_{\mathbf{x}_i \in S_k} \|\mathbf{x}_i - \mathbf{c}_k\|_2^2 \qquad (1)$$

where $\|\cdot\|_2$ denotes the Euclidean ($L_2$) norm and $\mathbf{c}_k$ is the center of cluster $S_k$
calculated as the mean of the points that belong to this cluster. This problem is
known to be computationally intractable even for K = 2 [40], but a heuristic method
developed by Lloyd [41] offers a simple solution. Lloyd's algorithm starts with K
arbitrary centers, typically chosen uniformly at random from the data points [42].
Each point is then assigned to the nearest center, and each center is recalculated as
the mean of all points assigned to it. These two steps are repeated until a predefined
termination criterion is met. The pseudocode for this procedure is given in Algo. (1)
(bold symbols denote vectors). Here, m[i] denotes the membership of point $\mathbf{x}_i$, i.e.
the index of the cluster center that is nearest to $\mathbf{x}_i$.

input : $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\} \in \mathbb{R}^D$ (N × D input data set)
output: $C = \{\mathbf{c}_1, \ldots, \mathbf{c}_K\} \in \mathbb{R}^D$ (K cluster centers)
Select a random subset C of X as the initial set of cluster centers;
while termination criterion is not met do
    for (i = 1; i ≤ N; i = i + 1) do
        Assign $\mathbf{x}_i$ to the nearest cluster;
        $m[i] = \operatorname{argmin}_{k \in \{1, \ldots, K\}} \|\mathbf{x}_i - \mathbf{c}_k\|_2$;
    end
    Recalculate the cluster centers;
    for (k = 1; k ≤ K; k = k + 1) do
        Cluster $S_k$ contains the set of points $\mathbf{x}_i$ that are nearest to the center $\mathbf{c}_k$;
        $S_k = \{\mathbf{x}_i \mid m[i] = k\}$;
        Calculate the new center as the mean of the points that belong to $S_k$;
        $\mathbf{c}_k = \frac{1}{|S_k|} \sum_{\mathbf{x}_i \in S_k} \mathbf{x}_i$;
    end
end

Algorithm 1: Conventional K-Means Algorithm
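
As an illustration of Algo. (1), the following Python sketch runs batch k-means on a list of color tuples. It is a minimal sketch and not the implementation used in the experiments. The names kmeans and sq_dist, the fixed iteration cap max_iter, the seed parameter, and the choice to keep the old center when a cluster becomes empty are assumptions made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, max_iter=10, seed=0):
    """Batch (Lloyd) k-means; returns (centers, membership, sse)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random subset of the data as initial centers
    member = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: find the nearest center for every point.
        for i, x in enumerate(points):
            member[i] = min(range(k), key=lambda j: sq_dist(x, centers[j]))
        # Update step: each center becomes the mean of its cluster.
        for j in range(k):
            cluster = [x for x, m in zip(points, member) if m == j]
            if cluster:  # assumption: an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    # Sum of squared errors of each point to the center of its cluster.
    sse = sum(sq_dist(x, centers[m]) for x, m in zip(points, member))
    return centers, member, sse
```

For example, calling kmeans(pixels, 2) on a list of RGB triples returns two palette entries, the cluster index of every pixel, and the SSE of Eq. (1) for that partition.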

When compared to the preclustering methods, there are two problems with using
KM for color quantization. First, due to its iterative nature, the algorithm might
require an excessive amount of time to obtain an acceptable output quality. Second,
the output is quite sensitive to the initial choice of the cluster centers. In order to
address these problems, we propose several modifications to the conventional KM
algorithm:

• Data sampling: A straightforward way to speed up KM is to reduce the
amount of data, which can be achieved by sampling the original image.
Although random sampling can be used for this purpose, there are two problems
with this approach. First, random sampling will further destabilize the
clustering procedure in the sense that the output will be less predictable. Second,
the sampling rate will be an additional parameter that will have a significant impact
on the output. In order to avoid these drawbacks, we propose a deterministic
sampling strategy in which only the pixels with unique colors are sampled.
The unique colors in an image can be determined efficiently using a hash
table that uses chaining for collision resolution and a universal hash function of
the form $h_a(\mathbf{x}) = \left(\sum_{i=1}^{3} a_i x_i\right) \bmod m$, where $\mathbf{x} = (x_1, x_2, x_3)$ denotes a pixel
with red ($x_1$), green ($x_2$), and blue ($x_3$) components, m is a prime number,
and the elements of the sequence $a = (a_1, a_2, a_3)$ are chosen randomly from the set
$\{0, 1, \ldots, m-1\}$.

• Sample weighting: An important disadvantage of the proposed sampling
strategy is that it disregards the color distribution of the original image. In
order to address this problem, each point is assigned a weight that is
proportional to its frequency (note that the frequency information is collected during
the data sampling stage). The weights are normalized by the number of pixels
in the image to avoid numerical instabilities in the calculations. In addition,
Algo. (1) is modified to incorporate the weights in the clustering procedure.

• Sort-Means algorithm: The assignment phase of KM involves many
redundant distance calculations. In particular, for each point, the distances to each
of the K cluster centers are calculated. Consider a point $\mathbf{x}_i$, two cluster
centers $\mathbf{c}_a$ and $\mathbf{c}_b$, and a distance metric d. Using the triangle inequality, we have
$d(\mathbf{c}_a, \mathbf{c}_b) \le d(\mathbf{x}_i, \mathbf{c}_a) + d(\mathbf{x}_i, \mathbf{c}_b)$. Therefore, if we know that $2d(\mathbf{x}_i, \mathbf{c}_a) \le d(\mathbf{c}_a, \mathbf{c}_b)$,
we can conclude that $d(\mathbf{x}_i, \mathbf{c}_a) \le d(\mathbf{x}_i, \mathbf{c}_b)$ without having to calculate $d(\mathbf{x}_i, \mathbf{c}_b)$.
The compare-means algorithm [43] precalculates the pairwise distances between
cluster centers at the beginning of each iteration. When searching for the nearest
cluster center for each point, the algorithm often avoids a large number of
distance calculations with the help of the triangle inequality test. The sort-means
(SM) algorithm [43] further reduces the number of distance calculations by
sorting the distance values associated with each cluster center in ascending order.
At each iteration, point $\mathbf{x}_i$ is compared against the cluster centers in increasing
order of distance from the center that $\mathbf{x}_i$ was assigned to in the previous
iteration. If a center that is far enough from this center is reached, all of the remaining
centers can be skipped and the procedure continues with the next point. In
this way, SM avoids the overhead of going through all the centers. It should
be noted that more elaborate approaches to accelerate KM have been proposed
in the literature. These include algorithms based on kd-trees [44], coresets [45],
and more sophisticated uses of the triangle inequality [46]. Some of these
algorithms [44,45] are not suitable for low dimensional data sets such as color
image data since they incur significant overhead to create and update auxiliary
data structures [46]. Others [46] provide computational gains comparable to
SM at the expense of significant conceptual and implementation complexity. In
contrast, SM is conceptually simple, easy to implement, and incurs very small
overhead, which makes it an ideal candidate for color clustering.
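
The data sampling and sample weighting steps described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: Python's built-in dictionary (through collections.Counter) stands in for the chained hash table, and the helper names unique_colors_with_weights and universal_hash are hypothetical. The universal hash function is shown separately only to make the formula concrete; the Counter does not use it.

```python
from collections import Counter


def unique_colors_with_weights(pixels):
    """Return (colors, weights): the unique RGB triples and their
    frequencies normalized by the number of pixels in the image."""
    counts = Counter(pixels)  # hash table mapping color -> frequency
    total = len(pixels)
    colors = list(counts)
    weights = [counts[c] / total for c in colors]
    return colors, weights


def universal_hash(color, a, m):
    """h_a(x) = (a_1*x_1 + a_2*x_2 + a_3*x_3) mod m for an RGB triple,
    where m is prime and each a_i is drawn from {0, 1, ..., m-1}."""
    return sum(ai * xi for ai, xi in zip(a, color)) % m
```

The returned weights sum to one, so they can be passed directly to a weighted clustering routine together with the list of unique colors.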

We refer to the KM algorithm with the abovementioned modifications as the
'Weighted Sort-Means' (WSM) algorithm. The pseudocode for WSM is given in Algo. (2).
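
As a rough illustration of how the weighting and the sort-means pruning fit together, the following Python sketch clusters a list of unique colors with their normalized weights. It is a sketch under stated assumptions, not the implementation evaluated in this paper: the function name weighted_sort_means, the fixed iteration cap, the seed parameter, and the handling of empty clusters are choices made for this example. Squared distances are used throughout, so the triangle inequality test 2d(x, c_p) ≤ d(c_p, c_t) becomes d²(c_p, c_t) ≥ 4 d²(x, c_p).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def weighted_sort_means(colors, weights, k, max_iter=10, seed=0):
    """Weighted k-means with sort-means pruning; returns (centers, membership)."""
    rng = random.Random(seed)
    centers = rng.sample(colors, k)  # random subset as initial centers
    member = [0] * len(colors)
    dims = range(len(colors[0]))
    for _ in range(max_iter):
        # Pairwise squared distances between the current centers.
        d = [[sq_dist(ci, cj) for cj in centers] for ci in centers]
        # order[p]: the other centers sorted by increasing distance from center p.
        order = [
            [t for t in sorted(range(k), key=lambda t, p=p: d[p][t]) if t != p]
            for p in range(k)
        ]
        for i, x in enumerate(colors):
            p = member[i]  # center assigned in the previous iteration
            prev_dist = min_dist = sq_dist(x, centers[p])
            for t in order[p]:
                if d[p][t] >= 4 * prev_dist:
                    break  # no remaining center can be closer than c_p
                dist = sq_dist(x, centers[t])
                if dist < min_dist:
                    min_dist = dist
                    member[i] = t
        # Each center becomes the weighted mean of the colors assigned to it.
        for j in range(k):
            idx = [i for i, m in enumerate(member) if m == j]
            w_sum = sum(weights[i] for i in idx)
            if w_sum > 0:  # assumption: an empty cluster keeps its old center
                centers[j] = tuple(
                    sum(weights[i] * colors[i][c] for i in idx) / w_sum for c in dims
                )
    return centers, member
```

Because the pruning test only skips centers that provably cannot be nearer than the previously assigned one, the assignment step yields the same memberships as an exhaustive search, which is why the variant is exact rather than approximate.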

input : $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_{N'}\} \in \mathbb{R}^D$ (N' × D input data set)
        $W = \{w_1, \ldots, w_{N'}\} \in [0, 1]$ (N' point weights)
output: $C = \{\mathbf{c}_1, \ldots, \mathbf{c}_K\} \in \mathbb{R}^D$ (K cluster centers)
Select a random subset C of X as the initial set of cluster centers;
while termination criterion is not met do
    Calculate the pairwise distances between the cluster centers;
    for (i = 1; i ≤ K; i = i + 1) do
        for (j = i + 1; j ≤ K; j = j + 1) do
            $d[i][j] = d[j][i] = \|\mathbf{c}_i - \mathbf{c}_j\|_2^2$;
        end
    end
    Construct a K × K matrix M in which row i is a permutation of
    1, ..., K that represents the clusters in increasing order of
    distance of their centers from $\mathbf{c}_i$;
    for (i = 1; i ≤ N'; i = i + 1) do
        Let $S_p$ be the cluster that $\mathbf{x}_i$ was assigned to in the previous iteration;
        p = m[i];
        min_dist = prev_dist = $\|\mathbf{x}_i - \mathbf{c}_p\|_2^2$;
        Update the nearest center if necessary;
        for (j = 2; j ≤ K; j = j + 1) do
            t = M[p][j];
            if d[p][t] ≥ 4 prev_dist then
                There can be no other closer center. Stop checking;
                break;
            end
            dist = $\|\mathbf{x}_i - \mathbf{c}_t\|_2^2$;
            if dist < min_dist then
                $\mathbf{c}_t$ is closer to $\mathbf{x}_i$ than $\mathbf{c}_p$;
                min_dist = dist;
                m[i] = t;
            end
        end
    end
    Recalculate the cluster centers;
    for (k = 1; k ≤ K; k = k + 1) do
        Calculate the new center $\mathbf{c}_k$ as the weighted mean of points that are nearest to it;
        $\mathbf{c}_k = \left( \sum_{m[i]=k} w_i \mathbf{x}_i \right) / \left( \sum_{m[i]=k} w_i \right)$;
    end
end

Algorithm 2: Weighted Sort-Means Algorithm

3. Experimental Results and Discussion

3.A. Image set and performance criteria

The proposed method was tested on some of the most commonly used test images in
the quantization literature. The natural images in the set included Airplane (512 x 512,
77,041 (29%) unique colors), Baboon (512 x 512, 153,171 (58%) unique colors), Boats
(787 x 576, 140,971 (31%) unique colors), Lenna (512 x 480, 56,164 (23%) unique
colors), Parrots (1536 x 1024, 200,611 (13%) unique colors), and Peppers (512 x 512,
111,344 (42%) unique colors). The synthetic images included Fish (300 x 200, 28,170
(47%) unique colors) and Poolballs (510 x 383, 13,604 (7%) unique colors).

The effectiveness of a quantization method was quantified by the Mean Squared
Error (MSE) measure:

$$\mathrm{MSE}(X, \hat{X}) = \frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} \|X(h, w) - \hat{X}(h, w)\|_2^2 \qquad (2)$$

where $X$ and $\hat{X}$ denote respectively the H x W original and quantized images in
the RGB color space. MSE represents the average distortion with respect to the $L_2^2$
norm (1) and is the most commonly used evaluation measure in the quantization
literature [HE]. Note that the Peak Signal-to-Noise Ratio (PSNR) measure can be
easily calculated from the MSE value:

$$\mathrm{PSNR} = 20 \log_{10} \left( \frac{255}{\sqrt{\mathrm{MSE}}} \right) \qquad (3)$$

The efficiency of a quantization method was measured by CPU time in milliseconds.
Note that only the palette generation phase was considered since this is the most
time-consuming part of the majority of quantization methods. All of the programs were
implemented in the C language, compiled with the gcc v4.2.4 compiler, and executed
on an Intel® Core™2 Quad Q6700 2.66 GHz machine. The time figures were averaged
over 100 runs.

3.B. Comparison of WSM against other quantization methods

The WSM algorithm was compared to some of the well-known quantization methods
in the literature:

• Median-cut (MC) [9]: This method starts by building a 32 x 32 x 32 color
histogram that contains the original pixel values reduced to 5 bits per channel
by uniform quantization. This histogram volume is then recursively split into
smaller boxes until K boxes are obtained. At each step, the box that contains
the largest number of pixels is split along the longest axis at the median point,
so that the resulting subboxes each contain approximately the same number of
pixels. The centroids of the final K boxes are taken as the color palette.

• Variance-based method (WAN) [11]: This method is similar to MC, with 
the exception that at each step the box with the largest weighted variance 
(squared error) is split along the major (principal) axis at the point that mini- 
mizes the marginal squared error. 

• Greedy orthogonal bipartitioning (WU) [13]: This method is similar to 
WAN, with the exception that at each step the box with the largest weighted 
variance is split along the axis that minimizes the sum of the variances on both 
sides. 

• Neu-quant (NEU) [35]: This method utilizes a one-dimensional
self-organizing map (Kohonen neural network) with 256 neurons. A random subset
of N/f pixels is used in the training phase and the final weights of the neurons
are taken as the color palette. In the experiments, the highest quality
configuration, i.e. f = 1, was used.

• Modified minmax (MMM) [28]: This method chooses the first center $\mathbf{c}_1$
arbitrarily from the data set and the i-th center $\mathbf{c}_i$ (i = 2, ..., K) is chosen to be
the point that has the largest minimum weighted $L_2^2$ distance (the weights for
the red, green, and blue channels are taken as 0.5, 1.0, and 0.25, respectively) to
the previously selected centers, i.e. $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_{i-1}$. Each of these initial centers
is then recalculated as the mean of the points assigned to it.

• Split & Merge (SAM) [23]: This two-phase method first divides the color
space uniformly into B partitions. This initial set of B clusters is represented
as an adjacency graph. In the second phase, (B − K) merge operations are
performed to obtain the final K clusters. At each step of the second phase, the
pair of clusters with the minimum joint quantization error are merged. In the
experiments, the initial number of clusters was set to B = 20K.

• Fuzzy c-means (FCM) [47]: FCM is a generalization of KM in which
points can belong to more than one cluster. The algorithm involves the
minimization of the functional $J_q(U, V) = \sum_{i=1}^{N} \sum_{k=1}^{K} u_{ik}^{q} \|\mathbf{x}_i - \mathbf{v}_k\|_2^2$ with
respect to U (a fuzzy K-partition of the data set) and V (a set of
prototypes, i.e. cluster centers). The parameter q controls the fuzziness of the
resulting clusters. At each iteration, the membership matrix U is updated by
$u_{ik} = \left( \sum_{j=1}^{K} \left( \|\mathbf{x}_i - \mathbf{v}_k\|_2 / \|\mathbf{x}_i - \mathbf{v}_j\|_2 \right)^{2/(q-1)} \right)^{-1}$, which is followed by the
update of the prototype matrix V by $\mathbf{v}_k = \left( \sum_{i=1}^{N} u_{ik}^{q} \mathbf{x}_i \right) / \left( \sum_{i=1}^{N} u_{ik}^{q} \right)$. A naive
implementation of the FCM algorithm has a complexity that is quadratic in K.
In the experiments, a linear complexity formulation described in [48] was used
and the fuzziness parameter was set to q = 2 as commonly seen in the fuzzy
clustering literature [39].

• Fuzzy c-means with partition index maximization (PIM) [32]: This
method is an extension of FCM in which the functional to be minimized
incorporates a cluster validity measure called the 'partition index' (PI). This
index measures how well a point $\mathbf{x}_i$ has been classified and is defined as
$P_i = \sum_{k=1}^{K} u_{ik}^{q}$. The FCM functional can be modified to incorporate PI as
follows: $J_q^{\alpha}(U, V) = \sum_{i=1}^{N} \sum_{k=1}^{K} u_{ik}^{q} \|\mathbf{x}_i - \mathbf{v}_k\|_2^2 - \alpha \sum_{i=1}^{N} P_i$. The parameter α
controls the weight of the second term. The procedure that minimizes $J_q^{\alpha}(U, V)$
is identical to the one used in FCM except for the membership matrix
update equation: $u_{ik} = \left( \sum_{j=1}^{K} \left[ \left( \|\mathbf{x}_i - \mathbf{v}_k\|_2 - \alpha \right) / \left( \|\mathbf{x}_i - \mathbf{v}_j\|_2 - \alpha \right) \right]^{2/(q-1)} \right)^{-1}$. An
adaptive method to determine the value of α is to set it to a fraction 0 < δ < 0.5
of the distance between the nearest two centers, i.e. $\alpha = \delta \min_{i \neq j} \|\mathbf{v}_i - \mathbf{v}_j\|_2$.
Following [32], the fraction value was set to δ = 0.4.

• Finite-state k-means (FKM) [25]: This method is a fast approximation for
KM. The first iteration is the same as that of KM. In each of the subsequent
iterations, the nearest center for a point $\mathbf{x}_i$ is determined from among the K'
(K' ≪ K) nearest neighbors of the center that the point was assigned to in
the previous iteration. When compared to KM, this technique leads to
considerable computational savings since the nearest center search is performed in a
significantly smaller set of K' centers rather than the entire set of K centers.
Following [25], the number of nearest neighbors was set to K' = 8.

• Stable-flags k-means (SKM) [26]: This method is another fast
approximation for KM. The first I' iterations are the same as those of KM. In the
subsequent iterations, the clustering procedure is accelerated using the concepts
of center stability and point activity. More specifically, if a cluster center
does not move by more than θ units (as measured by the $L_2^2$ distance) in two
successive iterations, this center is classified as stable. Furthermore, points that
were previously assigned to the stable centers are classified as inactive. At each
iteration, only unstable centers and active points participate in the clustering
procedure. Following [26], the algorithm parameters were set to I' = 10 and
θ = 1.0.
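
To make the FCM update equations in the list above concrete, the following Python sketch performs one naive FCM iteration (membership update followed by prototype update). It is only an illustration of the naive formulation, which is quadratic in K, and not the linear complexity formulation used in the experiments. The function name fcm_update and the small floor applied to zero distances are assumptions made for this example.

```python
import math


def fcm_update(points, prototypes, q=2.0):
    """One naive fuzzy c-means iteration.

    Returns (U, V): the membership matrix (one row per point) and the
    updated prototypes (one tuple per cluster)."""

    def dist(a, b):
        return math.sqrt(sum((p - r) ** 2 for p, r in zip(a, b)))

    expo = 2.0 / (q - 1.0)
    U = []
    for x in points:
        # Floor the distances so a point lying on a prototype does not divide by zero.
        d = [max(dist(x, v), 1e-12) for v in prototypes]
        U.append([1.0 / sum((dk / dj) ** expo for dj in d) for dk in d])
    V = []
    for k in range(len(prototypes)):
        den = sum(U[i][k] ** q for i in range(len(points)))
        V.append(
            tuple(
                sum(U[i][k] ** q * x[c] for i, x in enumerate(points)) / den
                for c in range(len(points[0]))
            )
        )
    return U, V
```

Each row of U sums to one, which reflects the fact that a point distributes a total membership of one over all clusters.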

For each KM-based quantization method (except for SKM), two variants were
implemented. In the first one, the number of iterations was limited to 10, which makes
this variant suitable for time-critical applications. These fixed-iteration variants are
denoted by the plain acronyms KM, FKM, and WSM. In the second variant, to obtain
higher quality results, the method was executed until it converged. Convergence was
determined by the following commonly used criterion [38]: $(\mathrm{SSE}_{i-1} - \mathrm{SSE}_i)/\mathrm{SSE}_i \le \varepsilon$,
where $\mathrm{SSE}_i$ denotes the SSE (1) value at the end of the i-th iteration. Following [25,26],
the convergence threshold was set to ε = 0.0001. The convergent variants of KM,
FKM, and WSM are denoted by KM-C, FKM-C, and WSM-C, respectively. Note
that since SKM involves at least I' = 10 iterations, only the convergent variant was
implemented for this method. As for the fuzzy quantization methods, i.e. FCM and
PIM, due to their excessive computational requirements, the number of iterations for
these methods was limited to 10.
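
The convergence test used by the convergent variants can be written as a small Python helper. This is a sketch under assumptions: the function name has_converged is hypothetical, and treating a zero SSE as converged is a choice made here to avoid a division by zero.

```python
def has_converged(prev_sse, sse, eps=1e-4):
    """Relative SSE improvement test: (SSE_{i-1} - SSE_i) / SSE_i <= eps."""
    if sse == 0.0:
        return True  # assumption: a perfect fit cannot be improved further
    return (prev_sse - sse) / sse <= eps
```

A clustering loop would call this helper once per iteration with the SSE values of the previous and current iterations and stop as soon as it returns True.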

Tables 1 and 2 compare the performance of the methods at quantization levels K =
{32, 64, 128, 256} on the test images. Note that, for computational simplicity, random
initialization was used in the implementations of FCM, PIM, KM, KM-C, FKM,
FKM-C, SKM, WSM, and WSM-C. Therefore, in Table 1, the quantization errors
for these methods are specified in the form of mean (μ) and standard deviation (σ)
over 100 runs. The best (lowest) error values are shown in bold. In addition, with
respect to each performance criterion, the methods are ranked based on their mean
values over the test images. Table 3 gives the mean ranks of the methods. The last
column gives the overall mean ranks with the assumption that each criterion has
equal importance. Note that the best possible rank is 1. The following observations
are in order:

> In general, the postclustering methods are more effective but less efficient when
compared to the preclustering methods.

> With respect to distortion minimization, WSM-C outperforms the other
methods by a large margin. This method obtains an MSE rank of 1.06, which means
that it almost always obtains the lowest distortion.

> WSM obtains a significantly better MSE rank than its fixed-iteration rivals. 

> Overall, WSM and WSM-C are the best methods. 

> In general, the fastest method is MC, which is followed by SAM, WAN, and 
WU. The slowest methods are KM-C, FCM, PIM, FKM-C, KM, and SKM. 

> WSM-C is significantly faster than its convergent rivals. In particular, it provides 
up to 392 times speed up over KM-C with an average of 62. 

> WSM is the fastest post-clustering method. It provides up to 46 times speed up 
over KM with an average of 14.

> KM-C, FKM-C, and WSM-C are significantly more stable (particularly when
K is small) than their fixed-iteration counterparts as evidenced by their low
standard deviation values in Table 1. This was expected since these methods
were allowed to run longer, which helped them overcome potentially adverse
initial conditions.

Table 4 gives the mean stability ranks of the methods that involve random
initialization. Given a test image and K value combination, the stability of a method is
calculated based on the coefficient of variation (σ/μ) as: 100(1 − σ/μ), where μ and
σ denote the mean and standard deviation over 100 runs, respectively. Note that the
μ and σ values are given in Table 1. Clearly, the higher the stability of a method the
better. For example, when K = 32, WSM-C obtains a mean MSE of 57.461492 with
a standard deviation of 0.861126 on the Airplane image. Therefore, the stability of
WSM-C in this case is calculated as 100(1 − 0.861126/57.461492) = 98.50%. It can
be seen that WSM-C is the most stable method, whereas WSM is the most stable
fixed-iteration method.
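
The evaluation measures used in this section (MSE, PSNR, and the stability score) can be computed as in the following Python sketch. It assumes that an image is given as a list of rows of RGB triples; the function names mse, psnr, and stability are chosen for this example only.

```python
import math


def mse(original, quantized):
    """Mean squared error between two equal-size images, each given as a
    list of rows of RGB triples (squared L2 distance averaged over pixels)."""
    h, w = len(original), len(original[0])
    total = sum(
        (a - b) ** 2
        for row_o, row_q in zip(original, quantized)
        for p, q in zip(row_o, row_q)
        for a, b in zip(p, q)
    )
    return total / (h * w)


def psnr(mse_value, peak=255.0):
    """Peak signal-to-noise ratio in decibels for a given MSE value."""
    return 20.0 * math.log10(peak / math.sqrt(mse_value))


def stability(mean, std):
    """Stability score 100 * (1 - sigma / mu) from mean and standard deviation."""
    return 100.0 * (1.0 - std / mean)
```

With the Airplane figures quoted above, stability(57.461492, 0.861126) evaluates to about 98.50, in agreement with the worked example in the text.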

Figure 1 shows sample quantization results and the corresponding error images.
The error image for a particular quantization method was obtained by taking the 
pixelwise absolute difference between the original and quantized images. In order to 
obtain a better visualization, pixel values of the error images were multiplied by 4 
and then negated. It can be seen that WSM-C and WSM obtain visually pleasing 
results with less prominent contouring. Furthermore, they achieve the highest color 
fidelity which is evident by the clean error images that they produce. 

Figure 2 illustrates the scaling behavior of WSM with respect to K. It can be seen
that the complexity of WSM is sublinear in K, which is due to the intelligent use
of the triangle inequality that avoids many distance computations once the cluster
centers stabilize after a few iterations. For example, on the Parrots image, increasing
K from 2 to 256 results in only about a 3.67-fold increase in the computational time
(172 ms vs. 630 ms).




Fig. 1. Sample quantization results for the Airplane image (K = 32): (a) MMM output, (b) MMM error, (c) NEU output, (d) NEU error, (e) WSM output, (f) WSM error, (g) WSM-C output, (h) WSM-C error

Fig. 2. CPU time for WSM for K = {2, ..., 256}

We should also mention two other KM-based quantization methods [24,27]. As in
the case of FKM and SKM, these methods aim to accelerate KM without degrading
its effectiveness. However, they do not address the stability problems of KM and thus
provide almost the same results in terms of quality. In contrast, WSM (WSM-C) not
only provides considerable speed up over KM (KM-C), but also gives significantly
better results especially at lower quantization levels.

4. Conclusions 

In this paper, a fast and effective color quantization method called WSM (Weighted 
Sort-Means) was introduced. The method involves several modifications to the con- 
ventional k-means algorithm including data reduction, sample weighting, and the use 
of triangle inequality to speed up the nearest neighbor search. Two variants of WSM 
were implemented. Although both have very reasonable computational requirements, 
the fixed-iteration variant is more appropriate for time-critical applications, while the 
convergent variant should be preferred in applications where obtaining the highest 
output quality is of prime importance, or the number of quantization levels or the 
number of unique colors in the original image is small. Experiments on a diverse set 
of images demonstrated that the two variants of WSM outperform state-of-the-art 
quantization methods with respect to distortion minimization. Future work will be 
directed toward the development of a more effective initialization method for WSM. 
The implementation of WSM will be made publicly available as part of the Fourier
image processing and analysis library, which can be downloaded from
http://sourceforge.net/projects/fourier-ipal.

Table 1. MSE comparison of the quantization methods

Table 2. CPU time comparison of the quantization methods

Table 3. Performance rank comparison of the quantization methods

Table 4. Stability rank comparison of the quantization methods